Add ADR for large message chunking in MQTT protocol #607
base: main
Conversation
Pull Request Overview
This pull request introduces an Architectural Decision Record (ADR) detailing the proposed implementation of large message chunking in the MQTT protocol. The document outlines the context, decision rationale, protocol flow, benefits, and implementation considerations for handling oversized MQTT messages.
- Introduces a new ADR document.
- Describes the protocol flow for both sending and receiving large messages.
- Highlights implementation considerations including error handling, performance optimization, and security.
I feel a key question we need to answer is: when can the client chunk, and when can it not?
This will depend on whether the receiver is able to understand our chunking protocol.
So we need to enable this only when both sides of the communication pipe use the same mechanism. One example is mRPC. Telemetry can also be applicable, but I suppose telemetry could also be asymmetric?
Co-authored-by: Valerie Avva Lim <[email protected]>
cc206a6 to 4967150
Co-authored-by: Tim Taylor <[email protected]>
…d configuration settings
The receiving client uses the Message Expiry Interval from the first chunk as the timeout period for collecting all remaining chunks of the message.

Edge case:
- Since the Message Expiry Interval is specified in seconds, chunked messages may behave differently than single messages when the expiry interval is very short (e.g., 1 second remaining). For a single large message, the QoS flow would complete even if the expiry interval expires during transmission. However, with chunking, if the remaining expiry interval is too short to receive all chunks, the message reassembly will fail due to timeout.
For other message expiry calculations, we always round partial seconds up, never down - I imagine doing something similar here should maintain acceptable behavior.
On your statement that the QoS flow would (not) complete in the chunking scenario - what exactly do you mean by this? Just that the message might not be received by the end application if all chunks aren't delivered in time, or is there some other ramification of the QoS flow not completing? We would definitely still need to ack all messages in the chunking scenario.
It means that it does not matter if a single message expires during QoS flow execution - the flow will still complete (even if it involves resending the whole message). With chunks, however, at the very end of the expiration interval it is possible that some tail chunks expire before their transfer even starts, which has the effect of the whole (original) message expiring mid-flight and the transfer being canceled.
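To illustrate the round-up suggestion above, here is a minimal sketch (in Python, with hypothetical names; not part of the ADR) of how a receiver could derive its reassembly deadline from the first chunk's Message Expiry Interval, rounding partial seconds up:

```python
import math
import time

def reassembly_deadline(message_expiry_interval_s: float,
                        first_chunk_received_at: float) -> float:
    """Deadline for collecting the remaining chunks, derived from the first
    chunk's Message Expiry Interval. Partial seconds are rounded up, never
    down (assumption: mirroring the other expiry calculations mentioned above)."""
    return first_chunk_received_at + math.ceil(message_expiry_interval_s)

# Hypothetical usage: once the deadline passes, drop the buffered chunks.
deadline = reassembly_deadline(1.4, first_chunk_received_at=time.monotonic())
if time.monotonic() > deadline:
    ...  # cleanup buffers and discard the partial message
```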
Note over Receiver: Message Expiry Interval exceeded
Receiver->>Receiver: Timeout occurred
Receiver->>Receiver: Cleanup buffers
Note over Receiver: Notify application:<br/>ChunkTimeoutError
I think for these error cases, we would just log the error and ignore the received message, similar to any other invalid message that we've received. There's no action the application can take
Note over Receiver: Message Expiry Interval exceeded
Receiver->>Receiver: Timeout occurred
Receiver->>Receiver: Cleanup buffers
Note over Receiver: Notify application:<br/>ChunkTimeoutError
We will need some new RPC errors for communicating chunking issues to the invoker in the command response.
(This is a bad example, as this timeout would occur after the invoker has stopped listening, but for the buffer-size-full case there should be some communication.)
**Chunk size calculation:**

- Maximum chunk size will be derived from the MQTT CONNECT packet's Maximum Packet Size.
- I like having the maximum size based off CONNACK.
- Lower prio - it may be OK to punt on this in v1 - but you should think about whether you also want this to be configurable via a knob, where the size would be min(CONNACK calculation, knob setting) (see the sketch below).
- The scenario is that we could imagine, say, a low-end device that only has a small amount of RAM, where we only want to send smaller chunks even if the MQTT broker allows larger sizes.
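As a rough sketch of the min(CONNACK calculation, knob setting) idea - the function name and the header-overhead allowance below are illustrative assumptions, not part of the ADR:

```python
def effective_chunk_size(connack_max_packet_size: int,
                         configured_max_chunk_size: int | None = None,
                         estimated_header_overhead: int = 1024) -> int:
    """Chunk payload budget: the broker's Maximum Packet Size (from CONNACK)
    minus an allowance for packet headers and user properties, optionally
    capped by a local knob (e.g., for low-RAM devices)."""
    broker_limit = connack_max_packet_size - estimated_header_overhead
    if configured_max_chunk_size is not None:
        return min(broker_limit, configured_max_chunk_size)
    return broker_limit

# Example: broker allows 1 MB packets, but the device caps chunks at 64 KB.
print(effective_chunk_size(1_048_576, configured_max_chunk_size=65_536))  # -> 65536
```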
**The chunking mechanism will**:

- Be enabled/disabled by a configuration setting.
- Use standardized user properties for chunk metadata. The `__chunk` user property will contain a colon-separated string with chunking metadata: `<messageId>:<chunkIndex>:<totalChunks>:<checksum>`. The string will include:
- Speaking of versions - should this also include the version of the chunking protocol?
- How does mRPC command/response handle versioning of the mRPC protocol layer itself (which is different from the customer mRPC clients/servers having the version field)? Can we leverage those concepts?
- As one case, we could imagine(!) a v1 of this protocol requiring the sender to know the size up front, which we then relax in a v2 to support streaming.
- `messageId` - UUID string in the 8-4-4-4-12 format, present for every chunk.
- `chunkIndex` - unsigned 32-bit integer in decimal format, present for every chunk.
- `totalChunks` - unsigned 32-bit integer in decimal format, present only for the first chunk.
- `checksum` - SHA-256 hash in hexadecimal format (64 characters long), present only for the first chunk.
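For illustration, a small Python sketch of building and parsing this `__chunk` value; whether the first chunk uses index 0 and whether the checksum covers the full reassembled payload are assumptions the ADR would need to pin down:

```python
import hashlib
import uuid

def build_chunk_property(message_id: uuid.UUID, chunk_index: int,
                         total_chunks: int | None = None,
                         full_payload: bytes | None = None) -> str:
    """Format <messageId>:<chunkIndex>:<totalChunks>:<checksum>; the last two
    fields are present only on the first chunk."""
    parts = [str(message_id), str(chunk_index)]
    if chunk_index == 0:  # assumption: the first chunk carries totalChunks and checksum
        parts.append(str(total_chunks))
        parts.append(hashlib.sha256(full_payload).hexdigest())  # assumption: hash of full payload
    return ":".join(parts)

def parse_chunk_property(value: str) -> dict:
    """Split the colon-separated metadata back into named fields."""
    fields = value.split(":")
    meta = {"messageId": fields[0], "chunkIndex": int(fields[1])}
    if len(fields) == 4:  # first chunk
        meta["totalChunks"] = int(fields[2])
        meta["checksum"] = fields[3]
    return meta
```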
- Does anywhere else in mRPC use a checksum?
  - If yes, then it makes sense to keep it here, I guess.
  - If no, do we need it here? We rely on TCP to do checksum checks for us. And this checksum would still only check the payload; the headers could (in theory) get corrupted anyway.
I agree; an integrity check that travels with the data it protects can be spoofed by an attacker who can modify both.
**Configuration settings:**

- Enable/Disable

### Implementation Considerations
Outside the scope of this ADR, and potentially post-2510 even, but it's on my mind so recording it here.
- We need to think about who the customers of this can/should be. If the SDKs do most of the work, it should be an easy discussion. But on my mind:
  - Schema Registry team - maybe. Even on a tiny configuration size, the max message is still 4MB, which I think should cover the vast majority of schemas. Though maybe not - OPC UA schemas get big.
  - Tinykube / WASM - 100%! These WASM modules can be really large - like dozens of MB. And since both the Tinykube server and client are owned by Microsoft, it should be really easy to add this.
  - Dataflows - ??. I don't actually know, since I don't have a sense of how large the messages are.
  - OPC UA / connectors - ??. It depends on what we do with Dataflows, as Dataflows is one of their main customers.
```mermaid
sequenceDiagram
    participant Sender as Sending Client
```
- Retry on the sender?
  - It looks like the sending client has no way of knowing whether its upload was actually processed by the receiver.
  - This is the same as any MQTT PUBLISH / telemetry, of course - getting a PUBACK only tells the publisher so much.
- The question is: is that OK for this case? Or do we want to let clients be more robust?
  - So some sort of mRPC-type response-topic construct - maybe sent in the 1st message - where the caller can indicate success|failure?
- We should understand the scenarios, though - we may indeed not need this.
  - For Tinykube, a client will be initiating "please, TK server, download this giant WASM module for me" and then the TK client will listen for the chunked response.
  - So if the chunked response doesn't come in time, the TK client would just retry the "give me the WASM module" request rather than relying on the chunking layer to retry.
- So something to think about, and if we don't think we need it, I think it's worth calling out in the design that we don't need it and why.
Given that the QoS of the original message will be applied to all chunks, wouldn't that give us the needed control over delivery guarantees?
There's always the possibility of a timeout - I think MQ sets messages with 24-hour timeouts by default, though I may be misremembering that.
Or we could imagine a client saying "timeout = 2 minutes" for, say, the entire operation, since if this fails they may want to retry or at least signal an error to the caller/user in a more timely manner. For that, I think you need a message to a response topic on the initiator.
I'm a bit confused after our last meeting (now that chunking is unconditional and automatic). Say I know my message is not going to be chunked - then QoS gives me all I need. On the other hand, if I know my message will be chunked, RPC with an underlying chunking MQTT client gives the desired level of delivery control on the sender side (probably with some extended information about the specific chunking failure in the InnerException):
- RPC calls mqttClient.PublishAsync(completeMessage)
- Chunking layer splits into chunks and sends them
- Something goes wrong with chunking (partial failure)
- RPC only gets the final result: success or failure of the entire message
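A minimal sketch of that sender-side flow (in Python; the `publish` callable is a hypothetical stand-in for the real MQTT client's publish method, and the metadata string is abbreviated):

```python
import uuid

def split_into_chunks(payload: bytes, max_chunk_size: int) -> list[bytes]:
    """Cut a large payload into at most max_chunk_size-byte pieces."""
    return [payload[i:i + max_chunk_size]
            for i in range(0, len(payload), max_chunk_size)]

def publish_chunked(publish, payload: bytes, max_chunk_size: int) -> None:
    """Sender-side chunking layer: split, tag each chunk with the `__chunk`
    user property, and publish in order. Any exception from `publish`
    propagates, so the caller (e.g., the RPC layer) sees a single overall
    success or failure for the entire message."""
    message_id = uuid.uuid4()
    chunks = split_into_chunks(payload, max_chunk_size)
    for index, chunk in enumerate(chunks):
        metadata = f"{message_id}:{index}"  # first chunk would also carry totalChunks and checksum
        publish(chunk, user_properties={"__chunk": metadata})
```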
If I understand the question correctly, the answer is that the receiving side's chunking layer would time out waiting for the final chunk of the RPC call and would not notify the user that any RPC was invoked. The RPC layer would only "count" an RPC call that has received all chunks successfully.
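A sketch of that receive-side behavior, assuming a simple in-memory buffer keyed by messageId (all names are hypothetical): incomplete messages are dropped on timeout, and only fully reassembled payloads are handed to the RPC layer.

```python
import time

class ChunkReassembler:
    """Collects chunks per messageId; returns the complete payload only once
    every chunk has arrived before the deadline, otherwise drops the partial
    message without surfacing anything to the application."""

    def __init__(self) -> None:
        self._buffers: dict[str, dict] = {}

    def add_chunk(self, message_id: str, index: int, data: bytes,
                  total: int | None = None,
                  deadline: float | None = None) -> bytes | None:
        entry = self._buffers.setdefault(
            message_id, {"chunks": {}, "total": None, "deadline": None})
        if total is not None:
            entry["total"] = total          # known from the first chunk
        if deadline is not None:
            entry["deadline"] = deadline    # derived from the Message Expiry Interval
        entry["chunks"][index] = data

        if entry["deadline"] is not None and time.monotonic() > entry["deadline"]:
            del self._buffers[message_id]   # timed out: drop silently, no RPC is "counted"
            return None
        if entry["total"] is not None and len(entry["chunks"]) == entry["total"]:
            del self._buffers[message_id]   # complete: reassemble in order
            return b"".join(entry["chunks"][i] for i in sorted(entry["chunks"]))
        return None
```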
## Context

The MQTT protocol has inherent message size limitations imposed by brokers and network constraints. Azure IoT Operations scenarios often require transmitting payloads that exceed these limits (e.g., firmware updates, large telemetry batches, complex configurations). Without a standardized chunking mechanism, applications must implement their own fragmentation strategies, leading to inconsistent implementations and interoperability issues.
You mean the ~256 MB message size limit?
ADR for Large Message Chunking
Transcript March 24, 2025